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Abstract. Despite the abundance of methods for variable selection and ac- 
commodating spatial structure in regression models, there is little precedent 
for incorporating spatial dependence in covariate inclusion probabilities for re- 
gionally varying regression models. The lone existing approach is limited by 
difficult computation and the requirement that the spatial dependence be rep- 
resented on a lattice, making this method inappropriate for areal models with 
irregular structures that often arise in ecology, epidemiology, and the social 
sciences. Here we present a novel method for spatial variable selection in areal 
generalized linear models that can accommodate arbitrary spatial structures 
and works with a broad subset of GLM likelihoods. The method uses a latent 
probit model with a spatial dependence structure where the binary response 
is taken as a covariate inclusion indicator for area-specific GLMs. The covari- 
ate inclusion indicators arise via thresholding of latent standard normals on 
which we place a conditionally autoregressive prior. We propose an efficient 
MCMC algorithm for computation that is entirely conjugate in any model with 
a conditionally Gaussian representation of the likelihood, thereby encompass- 
ing logistic, probit, multinomial probit and logit, Gaussian, and negative bino- 
mial regressions through the use of existing data augmentation methods. We 
demonstrate superior parameter recovery and prediction in simulation stud- 
ies as well as in applications to geographic voting patterns and population 
estimation. Though the method is very broadly applicable, we note in partic- 
ular that prior to this work, spatial population estimation/capture-recapture 
models allowing for varying list dependence structures has not been possible. 



1. Introduction 

Variable selection is an especially well-developed topic in statistics, as are meth- 
ods for incorporating spatial dependence in linear regression and other statistical 
models. The combination of variable selection with spatial statistics, however, has 
been left relatively unexplored. In this paper, we unite spatial statistics with vari- 
able selection in a broad subset of generalized linear models (GLMs) by allowing 
each region in an areal model to have its own regression model, i.e. each region's 
model includes different covariates. We achieve spatial information sharing in vari- 
able selection via a prior that induces spatially smoothed inclusion probabilities. 
Our model for spatially smoothed inclusion probabilities (SSIP) embeds a condi- 
tionally autoregressive (CAR) prior on latent standard normal variables arising 
from the data-augmentation algorithm for probit regression. The probit regression 
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is itself latent, as the binary indicators specify whether a variable is included or 
excluded from each regional model. By combining this representation of covariate 
inclusion indicators with any conditionally normal representation of a likelihood, we 
achieve a fully conjugate Bayesian model with very straight-forward computation. 
Because many GLM likelihoods can be represented as conditionally normal using 
data augmentation methods, our approach can be used with many GLMs. We pro- 
pose a Markov Chain Monte Carlo (MCMC) algorithm for estimation, which also 
benefits from the full conjugacy of the model, as we are able to marginalize over the 
regression coefficients to obtain good mixing. Further, we are able to accommodate 
a richer spatial dependence structure than the one existing method, which requires 
that spatial dependence structures be representable on a lattice. As such, the ex- 
isting method is not generally applicable to areal models, whereas our method can 
be used with any areal model. 

The only direct precedent for this work is that of Smith & Fahrmeir ( 2007 ) . They 
present a spatial variable selection model for functional magnetic resonance imag- 
ine (fMRI) data that also allows for a different regression model at each location. 
Their model, however, requires that the spatial locations lay on a grid, a require- 
ment of the Ising model prior they employ (sec Higdon (1994) for an overview of 
the Ising model). Their model inherits the computational difficulties of the Ising 
model, and they rely upon approximations to the density and update their inclu- 
sion indicator variables using Metropolis-Hastings. Further, the parameters of their 
Ising prior, which control the degree of spatial smoothing, are initially suggested 
to be pre-specified and not estimated as part of the procedure. They offer a way 
to simultaneously estimate these parameters which requires gridding the space to 
approximate to the normalizing constant of the Ising prior. Lee et al. (2011) shows 
an application of this model. Scheel et al. (2012) presents a Bayesian Poisson hurdle 
model with Ising smoothed variable selection. Similar to |Smith fc Fahrmeir (2007), 
they use an Ising prior for inclusion probability smoothing. Computation for this 
model is also cumbersome, as many of the parameters cannot be sampled from 
known distributions. They rely upon an assortment of methods from the adaptive 
Metropolis algorithm ( Roberts fc Rosenthal| ( |2006| ) to a reversible-jump sampling 
scheme ( |Green| ( |T994[ )). 

Some attention has been paid to spatial variable selection in the genetics liter- 
ature. Stingo et al. (20111, for example, model phenotypic outcomes given geno- 
tytpes and associated effects on biochemical pathways. They incorporate gene 
location information in formulating their variable inclusion probability prior. They 
use a Markov random field (MRF) for the location structure and specify that the 
probability that a gene is included in the model is dependent upon its neighbors' 
inclusion. In this case, there is only one regression model, as opposed to our vary- 
ing site-specific regression models. Neighboring genes on a chromosome then have 
smoothed probability of being included in the regression model. |Vannucci fc Stingo] 
(2011) offers an in-depth review of similar methodologies within the genetics liter- 
ature. 



|Reich et al. (20101 presents a model for variable selection for spatially varying 
coefficient regression. Here again, there is only one regression model for all (point- 
referenced) locations, and they select whether each variable should be excluded, 
included with a globally constant coefficient, or included with a spatially varying 
coefficient. 
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The use of a latent probit regression to inform prior inclusion probabilities is 
similar to the model selection approach of Quintana & Conti (2012 1. 

The rest of this article is organized as follows. Section [2] introduces the gen- 
eral regression framework for spatially varying models. Section |2.2| introduces a 
latent variable model for spatially smoothed binary indicators; Section [3] describes 
how to incorporate these indicators into a normal linear regression framework to 
allow spatially smoothed inclusion probabilities with a gaussian likelihood. Sub- 
sections of Section [3] describe the MCMC estimation algorithm, present a simple 
simulation example, and apply the normal model to election results data. Section 
[4] presents a model for negative-binomial regression with the SSIP prior, and dis- 
cusses the general applicability of the SSIP model to a number of other GLMs. 
Subsections of Section [4] propose an estimation algorithm that takes advantage of 
the conditionally normal representation of the negative binomial distribution using 
Polya-gamma latent variables, describes how negative binomial regression with the 
SSIP are a perfect candidate for spatial capture- recapture (CRC), presents a simu- 
lation example that shows the increased estimation power gained by leveraging the 
information contained in all spatial regions simultaneously in CRC, and analyzes 
a data set of people killed in Casanare, Colombia, aiming to estimate the total 
number of unaccounted for murders in the region. Section [5] concludes and offers 
suggestions for future directions. 



2. Generalized linear model with a SSIP prior 

2.1. Model. We are interested in developing a method to allow for regionally vary- 
ing, spatially dependent regression models. For simplicity, we employ a generalized 
linear regression model for each region in which, 



(1) 



E[Y i ]=g(XTf3 i ), 



where represents the vector of outcomes in region i. is the m,x p design ma- 
trix associated with region i, and g is the link function. See |Nelder fe Wedderburn| 
( 1972 ) for more details about generalized linear models. Borrowing of information 
for the regression parameters is permitted through a common prior on each of the 
/3j = [Pn-.-Pip] vectors of regression coefficients. We employ the spike and slab prior 
of |Lempers| ( |1971[ ), pitchell fc Beauchamp| ( [19881 ) , |Smith fc Kohn| (|1996[), and sim 



ilar to that used in stochastic search variable selection as in George fc McCulloch 
(1993) (among others) for ^ for each j G {1, ...,p}. That is 



(2) /% ~ ii 3 N(ii v Tf) + (1 - 7wMoC8«). 

where 8o(/3ij) is the Dirac delta function evaluated at j3ij. For any j, conditional 
on 7,j = 1, the f3ij share a common prior, resulting in sharing of information 
across regions in estimating the ftj 's. Put another way, the model has area-specific 
random effects. Conditional on 7,^ = 0, has a degenerate distribution at zero, 
so effectively the jth predictor is not included in region i's model. 

The unique feature of the model presented here is that we incorporate spatial 
structure in the region-specific variable inclusion indicators 7,^, inducing sharing 
of information between nearby regions in variable selection. To induce spatial 
dependence in the Ty's, we specify an independent spatially smoothed inclusion 
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probabilities (SSIP) prior for each covariate inclusion indicator: 
(3) lij = l[Z ij >Q] 

Z l0 ~ N(p-Y^Z k ,l), 



where rii are the number of neighbors of region i, and the sum is taken across all 
neighbors of region i, as denoted by the ~ relation. By specifying the distribution of 
7i conditional on the latent of its neighbors, we create spatial dependence in the 
region-specific models. In other words, if covariate j is included in the regression 
models for regions that are neighbors of region i, then covariate j is more likely 
to be included in the regression model for region i. The following section includes 
further discussion of this prior and its properties. 

To complete this model, we place a iV(/zo,so) prior on the /ijS and an inverse- 
Gamma IG(a t , b t ) prior on the t?s. 

2.2. Properties of the SSIP prior. Let 7 ,*j be a binary variable corresponding 
to region i which indicates whether the jth covariate is included in the model. Fol- 
lowing the data augmentation approach of Albert and Chib, we can represent each 
jij as a latent Gaussian variable with the binary outcome arising through thresh- 
olding, i.e. 7y = t[Zij > 0] and Zij ~ N(p,ij,Tj). Marginal of Zy, 7i j is Bernoulli 
distributed; the area-specific mean (Hj controls the parameter of the Bernoulli dis- 
tribution. To induce spatial dependence, we place a intrinsically autoregressive 



(IAR) prior on the latent Z t] ( |Besag_et al.| ( |l99l[ )), Z r] -\Z_ {l]j ~ N(p± £ fc ^. Z k , i) 
making the model that of equation [3] 

To understand the implications of this prior on the conditional inclusion prob- 
abilities, we notice that marginal of the latent variable Zij, Pr^ij = l\Z_^j) — 
^(p^f Efc~i^)' where $ represents the standard normal cumulative distribu- 
tion function. Intuition about the dynamics of this function is perhaps more easily 
gained if this probability is re-expressed as Pr{^ij = l\Z_^j) = Pr(Zij > 0) where 
Z^ ~ N(p^- X)fc~i Zk, )■ Holding the average number of neighbors that have the 
jth covariate included constant, as the number of neighbors increases, the prob- 
ability that the jth covariate is included in the Jth model increases if more than 
half of the neighboring models include it and decreases if more than half of the 
neighbors exclude it. If exactly half of the 7fcj's for region i's neighbors are one, 
the conditional prior inclusion probability remains one-half. Holding the number of 
neighbors constant, as the proportion of region i's neighbors that include the jth 
covariate increases, so too does the conditional prior inclusion probability. 



If p is not restricted to be one (as in the CAR model of Bcsag (1974)), the joint 
distribution of Z^ exists and Zj ~ N(0, S) where E" 1 = D — pW, D is diagonal, 
Da — fii, and W is an adjacency matrix. Under this joint model, marginally 
Z,^ ~ A^O, and the marginal prior probability of inclusion for each covariate 



is one-half, as the marginal mean is zero. As discussed in Ley & Steel (2009), this 
has implications for the prior model size; a strategy for relaxing this assumption is 
discussed in Section [5] If p is set to be zero, this model collapses to independent 
spike and slab priors for each region. That is, when p is zero, the Tij are independent 
and imply that jij are independent with prior probability, Pr(~yij = 1) = 5. 

Under this model, despite having omitted an explicitly spatial distribution on 
the normal component of the distribution of (3j of equation [2j there is spatial 
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smoothing of /3- in expectation: E[j3ij] = (1 — Pr(jij = As Pr(7ij = 1) 

is a spatially smoothed surface, so too is this expectation. An explicitly spatial 
distribution for the normal component of the beta distribution is also possible. For 
example, one could place an IAR or CAR prior on the subset of the (3jS for which 
'Yj is one- that is, one could remove the rows and columns of the adjacency matrix 
for which 7^ = 0. A similar approach to this, without any variable selection, is 



taken in Assungao ( 2003| . The implications of this on the spatial structure could be 



unexpected, as certain regions would not be included in the spatial smoothing (those 
where 7y = 0), and one could produce independent islands in the spatial surface. 
That is, there might be regions completely separated due to 7 that are spatially 
quite close but modeled as spatially independent. More unpredictable still is the 
spatial structure implied by taking a sub-set of the /3^-s where sub-regions are not 
entirely partitioned off, but rather those j3ij that are excluded in the spatial model 
are spaced irregularly. For this reason, we are satisfied with spatial smoothing in 
expectation and specify an iid prior for the normal mixture component. 

3. Linear regression with Gaussian error term and the SSIP 

As a first example of the SSIP, we consider a special case of the generalized linear 
model presented above. Here, we specify a linear regression with Gaussian error 
term and identity link function. In this case, 

(4) Y i = X i f3 i + e i , 

where 6j ~ N(0,a 2 I) has the standard iid error assumption. In this case, the 
algorithm for estimation is straight-forward. 

3.1. Algorithm. We use MCMC to fit this nearly fully-conjugate model. In the 
case where p = 1, the model is fully conjugate and all parameters can be updated 
from known distributions. If p is treated as a parameter to be estimated, it is the 
only parameter in the model that lacks a full conditional distribution with known 
closed form. It can be sampled easily using the Metropolis-Hastings algorithm (see 



Chib & Greenberg (1995)). The algorithm proceeds as follows. For each i 

2 



For each j, sample [7^, Zy|Z_[ i ] J -,7_ [i]j -,/i 3 -,T?, of] marginally of f3 t : 
— First sample 7^ marginally of Zif 7y ~ Bernoulli(pij), where 

w^(Y i \ lij = 1) 

Pij — 



*{Y| 7<j = l) + (l-w ij )V(Y\ lij =0) : 



w 



Wij = *(p^t J2k~i Z k), and *( Y l7y = 1) is tne marginal likelihood 
of the model that includes the jth covariate with all the others variables 
included/excluded according to the current value of Tr—iji ■ *(Y|7y = 
0) is denned analogously. W is the result of analytically integrating out 
(3 i and is available in closed form. 

Sample [Zij\jij]: Z tJ - N ni] [p^ J2k~i Z kj> ^r) , where JVh„ is a nor- 
mal distribution truncated to the region Qy, fijj = (—00, 0] if 7^ = 0, 
and Qij = [0,oo) if 7^ = 1. 



21 



Jointly update the ith region's regression coefficients: \J3 i \'y i ,iJi,T ,cr, 
N(M~ 1 m. l , M" 1 ), where M, = ^ + Diag(r- 2 ) and m, = 
/i T Diag(-r~ 2 ). 
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• Update [jj\7 i ,f3 i ]~Ga,mm&(a + m i /2,b+(Y i -X i l3 i ) T (Y i -X i p i )/2). 

• Update \pj\Pij, /3kj, A*, Sq] and [rj | Pij , /3kj,a t ,b t ] from normal and inverse- 
gamma distributions respectively. 



3.2. Simulation Example. We present a simple simulation example to demon- 
strate the spatial smoothing of regional model covariate inclusion probabilities 
achieved with SSIP. We simulate a different regression model for each region on 
a three by three grid, with up to two covariates plus the intercept included in each 
model. For each region, we draw a Beta distributed random variable and order 
them such that the lower numbered regions have the smaller random draws. We 
then sample the 7 inclusion indicators with Pr("fi = 1) = The result is that 
lower numbered regions tend to have fewer (or no) covariates included, whereas 
higher numbered regions do. We repeat this process for each of the two potential 
covariates to be included in this model. The X covariate matrix for each region is 
created by sampling iid uniform random variables, and the regression coefficients 
that have been determined to be included in the previous step are sampled as iid 
N(5, 0.25) (first coefficient) and 7V(3, 1) (second coefficient). For each region, we 
sample four observations from the specified regression model; the variance of the 
error term is one. 

We run the normal SSIP regression model for 10,000 iterations and compare the 
results to those obtained by fitting independent regional models using Bayesian 
Model Averaging (Hoeting et al. (1999) gives a thorough review) as implemented 
in the R package BMA ( |Raftery et al.| ( |2012| )) and the AIC. In the case of the AIC, 
inclusion probabilities are not relevant; we select the single model with the lowest 
AIC. Throughout this article, we will refer to these points of comparison as BMA 
and AIC. 

Figure [I] shows the inclusion probabilities ranging from one (white) to zero 
(black) for each region in the simulation for each of the methods in this comparison. 
The left-most four plots show the inclusion probabilities for the first covariate. The 
second covariate is shown on the right. Based only upon a visual inspection, we 
find that the SSIP is better able to identify the correct model in each region by 
borrowing information across neighboring regions. BMA and AIC both indicate 
that the top middle region should exclude the first covariate. However, because all 
of the neighboring regions include it, the SSIP is able to smooth over them and 
identify the covariate as likely included in that region's model. 

The second covariate is less precisely determined to be present or absent by 
all methods due to its lower signal to noise ratio. Again, based upon a visual 
inspection of these plots, it appears that the SSIP model more accurately captures 
the pattern of variable inclusion present in this simulation. Table [l] shows the mean 
squared error between the coefficients' true value and estimated value. The normal 
regression model with SSIP produces mean squared errors significantly lower than 
the other methods under study. 

These results provide some indication of the advantages of the SSIP method. 
However the simulation is designed to produce data with high spatial dependence 
in the area-specific models. It is the only model under consideration which borrows 
information across all regions. As there are so few observations in each region, it is 
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Figure 1 . Posterior inclusion probabilities for the first covariate 
(left) and second covariate (right). 

True Model SSIP 



AlC BMA 



True Model SSIP 




AlC BMA 




Table 1. Mean squared error of regression coefficients for the 
SSIP, AIC, and BMA 



SSIP 


BMA 


AIC 


MSE 0.3 


2.9 


8.0 



unsurprising that single-region estimates have high variance and thus difficulty in 
appropriately identifying regional models. 

3.3. Election Data. We consider a data set assembled from the website of the 
Census Bureau (http://www.census.gov). Our dependent variable is the percent 
of the population that voted Democrat in the 2008 presidential election for every 
county in the lower 48 states of the United States. As explanatory covariates, we 
consider the percent of the population that is black, the percent that qualifies as 
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earning income below the poverty line, and the unemployment rate by county. The 
regression model for election outcomes given demographic covariates is of some 
practical interest since it provides insight into the factors that influence voting 
decisions and how these demographic factors differ regionally. For example, the 
conclusion that poverty and race may be important predictors of voting patterns in 
some regions and unimportant in others is relevant both for understanding voting 
results ex post as well as for political strategists who wish to tailor campaign ma- 
terials to specific states or regions. The superior performance of the SSIP method 
on these data (see below) suggest its usefulness in political science and economics 
applications. Although in most cases the data would be richer and more complex 
than in this example, the relative ease with which SSIP scales to larger dimensions 
presages improved performance on larger datasets. 

We fit the standard linear regression model with spatially smoothed inclusion 
probabilities as outlined in section |3T| running the algorithm for 50,000 iterations. 
As above, we compare the results of our analysis to those obtained by using BMA 
and the AIC model selection criterion. Results of this analysis are presented in 
Figure [2j By using the SSIP, we are able to discern interpretable cultural regions 
in which different factors are important in explaining the percent of the population 
that voted Democrat in 2008. For example, focusing on the Percent Black variable, 
we find that this is a useful predictor (i.e. has high inclusion probability) throughout 
the South, Mid- West, and Northeast. Both the BMA and AIC methods fail to 
capture this interpretable regional pattern. Both the SSIP and BMA find that 
the percent poverty is included in the models of the Gulf Coast and the Great 
Lakes region. The AIC criterion includes this variable more widely, but a spatial 
pattern (and corresponding regional interpretation) is noticeably absent. Lastly, 
the unemployment rate exhibits an interesting spatial pattern of inclusion. East of 
the Great Plains, there is low probability of inclusion, with the notable exception 
of North Carolina. The Great Plains region shows a high probability of inclusion, 
and the region to the west is ambiguous as to whether it should be included or not. 
Both BMA and the SSIP roughly identify this Great Plains pattern, though the 
SSIP results in greater spatial smoothing, regional separation, and interpret ability. 



4. Negative binomial regression using Polya-Gamma latent variable 

representation with ssip 

Any regression model that admits a conditionally normal representation can 
easily incorporate the SSIP model for variable selection and maintain conjugate 
updates. Examples of such models include (Multinomial) Probit regression as in 
Albert & Chib (|1993 ), Logistic regression as in Poison et al. (2012) and quantile 



regression as m iKozumi & Kobayashil (I2011I) andlLum & Gelfandl (120121), among 



others. In fact, any linear model can incorporate the SSIP. If a marginal likelihood 
is not easily estimated or available in closed form, however, one would have to 
condition on (3 iy which would result in slower mixing. Sampling of /3 i would be 
subject to the same difficulties inherent in the non-conditionally Gaussian model 
without the SSIP prior. Here, we present an example of incorporating the SSIP 
into negative binomial regression using the Polya-Gamma distribution introduced 



in the recent work of Poison et al. (2012). 
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Figure 2. Posterior inclusion probabilities of each variable in 
the election data example. 
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Percent Black AIC 
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Percent Poverty BMA 



Percent Poverty AIC 




Unemployment Rate Unemployment Rate BMA Unemployment Rate AIC 




We take advantage of the conditionally conjugate representation of the nega- 



tive binomial distribution presented in Poison et al. (2012) wherein they use the 



standard model for over-dispersed count data, 



PoisiK) 



Xi\h,pi ~ Ga(h, 



1 - Pi 



Marginalizing over A,, 



(5) Yi~NB{h,pi). 

|Polson et al. (2012) introduce a latent variable representation of the nega- 
tive binomial distribution for linear regression, which accommodates fully conju- 
gate updates for both the latent variables and the regression coefficients. They 
show that the negative binomial likelihood can be represented by an integral, 
p(Yi) oc e^5— V>i J g-uiipi / 2 p(oji)duJi, where p(tJ^) is the Polya-Gamma distribution 
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with parameter h + Yi and no tilting parameter. From this expression, it is easy to 
see that conditional upon the latent cu i: p(Y i \uj i ) oc e - ^^'. That is, conditionally, 
the likelihood of Yi is quadratic in ipi, which we set equal to ipi = Xf f3. 
Using this representation, we model areal count data as follows: 



1 



Here, Y^ is the vector of counts in region i, and is the vector of corresponding 
parameters for the negative binomial distribution. We place the SSIP prior on (3 
as m Section [2 



4.1. Algorithm. Following Poison et al. (2012), we take advantage of the con- 



jugacy of the Polya-Gamma representation of the negative binomial distribution, 
resulting in another fully-conjugate algorithm. The MCMC algorithm for fitting 
this model proceeds as follows: 

• Sample all latent u> random variables from u)i ~ PG(Yi + h, Xf 

• Set d = ^T- Notice that d ~ N(Xj ft, diag^- 1 )). 



• Proceed as in section 3.1 replacing Y^ of section [3T] with 

— For each j, 

* Sample 7y|Z_[jj] ~ Bernoulli^,,), with rjij oc ^(CJwi)wij. 

* Sample Zij\jij, Z_ [tj] - N yij {pJ2k~i z kv "i 1 )- 

— Jointly update fl^-fi. A*, r 2 . 

— Update /x and t 2 . 

4.2. Capture-recapture. Capture-recapture (CRC) is a natural application of 
negative binomial regression with SSIP. CRC seeks to estimate the total size of 
a population based upon multiple sub-samples of the population and the overlaps 
among them. Let there be fc = {1,...,K} sub-samples (or lists) collected. Then, 
the data consists of counts of the number of individuals that appear on each list 
intersection. We define a list intersection as a binary string, d\...dK, where dk = 1 
for an intersection including list k and otherwise. Y& X .„& K is the number of indi- 
viduals who appear in intersection di...dx- This is perhaps most easily explained 
by example. Y"ooi is the number of elements that appear only on list three when 
K = 3; Yioio is the number of elements that appear on lists one and three but do 



not appear on lists two and four for K = 4. The traditional method (see Fienberg 



(1972)) then fits a Poisson regression to the above counts, 



Y dl ...d K ~ Pois(^ dl ... dK ) 
(6) \og(fi dl ... dK ) = a + /3xdi + .../3 K d K . 

The number of elements seen on no list (Y"o...o) is then estimated as e a . As spec- 
ified above, this model assumes that the lists are collected independently. Incorpo- 
rating list dependence is as simple as including interaction terms to the log-linear 



regression, as first pointed out by Fienberg (1972 1. For example, to account for 



dependence between list one and two, one would also include findxdi in Equation 

m 
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CRC generally relies upon the assumption that all members of the population 
have equal probability of being sampled by each list. When it is unreasonable 
to believe that this is true, stratification is often done to break the samples into 
sub-populations within which it is plausible that the probability of capture for 



each member of the sub-population is approximately equal by list (Bruno et al. 
( |1994[ )). One variable on which to stratify is spatial location. Particularly in human 
applications of population estimation, in which people in different geographic or 
administrative regions may have differing probability of being reported to each data 
collecting agency, it is sensible to subdivide the population spatially. Additionally, 
it is also often of interest to have separate estimates by region, making spatial 
stratification useful both as a tool to satisfy the model's assumptions and as a 
way to achieve a higher resolution analysis of the spatial distribution of unreported 
individuals. However, the degree to which stratification is possible is dependent 
upon the amount of data available. If one stratifies too much, each stratum is 
left with so little data pertaining to it that independently obtaining reasonable 
estimates for individual strata is impossible. 

Estimation of these models is further complicated by the fact the there is often 
little information a priori about which model ought to be used, i.e. which interac- 
tion indicators between lists should be included. However, it is logical that spatially 
proximate regions should have similar models, as collection agencies may operate 
similarly regionally and sentiment that would result in a population trusting an 
agency with its data may also exhibit a spatial patterns. This makes capture- 
recapture a perfect candidate for using SSIP, which will allow for finer spatial 
stratification through borrowing of information across space, both in terms of the 
mean estimates and informing about variable inclusion probabilities. 

Recent work of |Royle &: Young (2008) and Marques et al. (2012) also presents a 
spatial capture-recapture model. Royle & Young (2008) considers a continuous spa- 
tial resolution (often not available in human applications due to privacy concerns) 
in which they assume no list dependence. In our case, list dependence is absolutely 
required, and in fact, varying spatial models are necessary as well. The spatial 
aspect of this work is due to the fact that animals may move during the course of 
the study. As we are considering a retrospective study of violence, victims can only 
be killed at one place in time and space thus making these models inappropriate in 
this context. 



4.3. Simulation Example. We consider a case in which there are four lists (K=4) 
which sample individuals that exist at spatial locations contained in S = [0, l] 2 . 
Let the intensity surface for the point distribution, A(s), be given by A(s) = 

c — | |s — .5| |J . Figure ?? shows this intensity surface and one realization of the 

locations of individuals sampled from this process. Our first simulated list can de- 
tect an event at location s = {x, y} with probability Pr(Li(x, y) = 1) = §(# + y) 3 - 
That is, for each location generated by A(s), it is included in L\ with spatially 
varying probability described by a cubic in x and y. In order to induce list depen- 
dence in some spatially adjacent regions, list two samples an event with prob- 
ability Pr(L 2 (x,y) = l\Lx{x,y)) = ^y 3 + ^ + §l[Zi(z,y) = l]l[x < 0.4]. 
We make a similar conditional statement for L 3 ; Pr(L 3 (x,y) = l\Li(x,y)) = 
Y^x 2 + jq + \^-{L\{x,y) — l\l[x > 0.4]. Ergo, on the left-hand side of the space, 
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those events that were detected by list one have a higher probability of being de- 
tected by list two, whereas on the right-hand side of the space, there is positive list 
dependence between list one and list three. List four is generated independently 
of all other lists, Pr(Li4(x,y) = 1) = j^x 2 + j^. In summary, we sample locations 
according to a non-homogeneous Poisson process in continuous space. At each of 
these locations, each of the four lists detects individuals with the spatially varying 
probabilities and spatially varying dependence. This structure does not precisely 
simulate from our negative binomial regression model, though it does induce the 
properties that exist in real applications and which our model addresses. We dis- 
cretize the space into a 5 x 5 grid of cells to create the areal units. 

We fit the negative binomial regression model with SSIP as shown above with 
one small difference: we place a CAR prior on the regional intercepts. That is, we 

specify that = ^p^'+^/^J . ^ wnere x.; are the intersection indicators as above 

and Oii ~ CAR(r a ). Even with this added complexity we retain fully conjugate 
updates for all parameters. 

We take a Bayesian model averaging approach to estimating the total number of 
un-sampled individua ls. That is, for region i, we use the "model average" estimate 
(Raftery et al. (19971), di = jj J2m=i a i"^' i- e - the posterior mean estimate over 
all iterations of ati averaged over all of the sampled models. By this notation, we 
mean that we run the algorithm for M iterations, m indexes iteration, and a-™' is 
the sampled value at iteration m of on . 

To investigate what is gained by including both spatial smoothing of the response 
surface and spatial smoothing of the inclusion probabilities, we select implementa- 
tions of models for comparison that include some of these features. Our first two 
points of comparison are independently fit BMA and AIC estimates as before. An 



additional point of comparison is a fitting a spatial model with R-INLA (Rue & 



Martino (2009)), an implementation of nested Laplace approximation (Rue et al. 



(2009)). Using R-INLA, we estimate a model with a spatial IAR intercept and 
common iid normal prior for each of the other regression coefficients. Essentially, 
we use the same model as we will fit with the SSIP but the model does not vary 
regionally. We include all covariates and let the random effects determine values 
for each of them. Ideally, this would be able to estimate non-existent effects as zero 
coefficients. 

Table [2] shows the results of this experiment. We find that if we do not share 
information among all of the strata, there are many regions for which no reasonable 
estimate can be produced. This is demonstrated by the fact that in the methods 
that do not leverage the shared information, there are several estimates with infinite 
confidence intervals. In these intervals, some of estimates were on the order of 10 13 . 
By comparing our method to the R-INLA implementation, we see the added value 
of allowing different models by region. Our method has smaller credible intervals 
than those produced by the other models. In addition, our model has empirical 
coverage rate approximately equal to the nominal coverage rate of 0.95, it has the 
lowest root mean squared error as applied to the difference between the true number 
of unsampled individuals and the estimated number, and the mean median error is 
also the lowest among the methods considered. 
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Figure [3] illustrates this point by showing a comparison of the true counts to the 
estimated counts produced by each method. Our method produces a correlation be- 
tween the true and estimated values of 0.71, whereas the next best method achieves 
only 0.61. Figure [4] shows the inclusion probabilities for the interaction between list 
one and list two. Figure [5] shows the same for the interaction between list one and 
list three. It should be noted that the AIC criterion performs admirably in terms 
of variable selection. However, it fails to produce reasonable population estimates 
(and also produces infinite confidence intervals) in many regions. BMA appears 
to have less success at identifying the correct regional models, though all of the 
estimates in this particular simulation are of a reasonable order of magnitude and 
have finite credible intervals. The SSIP is more successful than BMA at identifying 
the regional model pattern and less successful than the AIC. Again, this drawback 
of the SSIP is outweighed by its superior performance in terms of predicting the 
number of uncounted individuals and smaller and more accurate credible intervals. 

Table 2. Results of MSE simulation example. 



True Count 


SSIP 


BMA 


INLA 


AIC 


33 


[1 


, 32 ] 


[ , 12 ] 


[4 


,48] 


[ , Inf ] 


50 


[4 


,75] 


[ 1 , 10 ] 


[10 


,81] 


[0,lnf] 


62 


[5 


, 93 ] 


[ 1 , 21 ] 


[ 13 


, ioo ] 


[0,lnf] 


51 


[8 


, 95 ] 


[ , Inf ] 


[ 14 


, ioi ] 


[ 6 , 66 ] 


21 


[1 


, 40 ] 


[1,1] 


[6 


, 60 ] 


[ , Inf ] 


48 


[3 


,51] 


[ , 29 ] 


[9 


, 73 ] 


[ 3 , 107 ] 


95 


[16 


, 189 ] 


[ 3 , 772 ] 


[ 25 


, 152 ] 


[ 8 , 228 ] 


117 


[16 


, 135 ] 


[ 11 , 183 ] 


[ 24 


, 137 ] 


[ 14 , 108 ] 


73 


[ 20 


, 165 ] 


[ 11 , 457 ] 


[ 25 


, 142 ] 


[ 21 , 574 ] 


43 


[5 


, 67 ] 


[ 1 , 85 ] 


[11 


, 84 ] 


[ 4 , 109 ] 


69 


[14 


, 176 ] 


[ 7 , 1094 ] 


[ 24 


, 159 ] 


[ , Inf ] 


87 


[ 23 


, 228 ] 


[ 16 , 745 ] 


[ 33 


, 191 ] 


[ 39 , 508 ] 


104 


[ 25 


, 203 ] 


[ 9 , 234 ] 


[ 34 


, 179 ] 


[ 23 , 141 ] 


80 


[ 30 


, 182 ] 


[ 30 , 154 ] 


[ 41 


, 202 ] 


[ 33 , 128 ] 


45 


[14 


, H7 ] 


[ 9 , 238 ] 


[ 23 


, 137 ] 


[ 16 , 77 ] 


57 


[12 


, 137 ] 


[ 16 , 773 ] 


[ 21 


, 140 ] 


[ 28 , 630 ] 


74 


[ 28 


, 224 ] 


[ 28 , 326 ] 


[ 35 


, 192 ] 


[ 41 , 259 ] 


66 


[21 


, 154 ] 


[ 13 , 158 ] 


[ 33 


, 168 ] 


[ 14 , 82 ] 


42 


[ 24 


, 172 ] 


[ 14 , 154 ] 


[ 34 


, 172 ] 


[ 29 , 125 ] 


16 


[7 


, 78 ] 


[ 2 , 71 ] 


[ 19 


, H9 ] 


[8, 121] 


28 


[4 


, 76 ] 


[1,1] 


[11 


, 97] 


[ 7 , 566 ] 


47 


[10 


, 125 ] 


[ 7 , 502 ] 


[ 20 


, 126 ] 


[ , Inf ] 


36 


[10 


,81] 


[ 4 , 38 ] 


[ 20 


, H4 ] 


[ 10 , 46 ] 


25 


[13 


, 132 ] 


[ 18 , 772 ] 


[ 21 


, 137 ] 


[ , Inf ] 


4 


[4 


,57] 


[ 1 , 98 ] 


[11 


, 93] 


[ 5 , 1507 ] 


Empirical Coverage 





.96 


0.72 





.92 


0.64 


RMSE 


107 


190 


113 


660177792443 


RMSE - no outliers 


107 


190 


113 


190 


Mean Median Difference 




16 


30 


18 


45103972194 
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FIGURE 3. Comparison of fitted values for synthetic MSE example. 
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4.4. Application: Estimating the number of killings in Casanare, Colom- 
bia. CRC is particularly appropriate for an application in the field of human rights 
because the total number of abuses is often under-reported- whether it is due to 
deliberate attempts by the perpetrators to keep their crimes hidden, fear of ret- 
ribution by family members who would normally report such crimes, lack of an 
infrastructure for reliable documentation, or any of numerous other reasons. In 
studies of violence, it is often not the case that the lists are independent. For ex- 
ample, one group documenting violations may have a policy of referring victims 
or family members of victims to another agency. Despite the fact that we know 
it is probable that there are list interactions, it is very difficult to specify a model 
with the appropriate interaction terms included a priori. As data collecting agen- 
cies may operate differently regionally and are trusted with varying degree, it is 
appropriate to allow the model to vary by location. 
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Figure 4. Comparison of inclusion probabilities for list interac- 
tion /?i2 by region. 
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Figure 5. Comparison of inclusion probabilities for list interac- 
tion /3i3 by region. 
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We analyze a data set of the number of individuals killed in Casanare, Colombia 
between 2001-200^3 

a time period during which the region was in the midst of a 
bloody conflict involving battles between paramilitaries, guerrillas, and the Colom- 
bian military ( |Guberek et al. (2010)). In prior analyses of this data, it was not 
possible to estimate the number of undocumented killings in each municipality and 
each year separately because of the sparsity of the data collected. In one analysis, 



Guzman et al. (2007) aggregated all of the southern municipalities (Sabanalarga, 



Villanueva, Monterrey, Aguazul, Tauramena, Mam, Chameza, Recetor and Yopal) 



This data is made freely available by Benetech's Human Rights Data Analysis Group 
(HRDAG) at https://www.hrdag.org/about/CasanareSummaries.html 
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to get an estimate of the combined total number of killings in the south for two 
nested time periods, 2001-2004 and 1998-2005. The later analyses of Guberek 



et al. (2010) and Lum et al. (2010) used slightly less regional aggregation, grouping 



the area into four regions: Center (Yopal and Augazul); Piedemonte (Sacama, La 
Salina, Tamara, Recetor, Chameza, and Nunchia); South(Tauramena, Monterrey, 
Villanueva, Mam', and Sabanalarga) ; and the Plains (Hato Corozal, Paz de Ariporo, 
Pore, San Luis de Palenque, Trinidad, and Orocue). They were, however, able to 
temporally disaggregate the analysis to produce yearly estimates. The necessity of 
the spatial aggregation was due to the inability of the available models to reliably 
produce estimates in a sparse data setting. 

In order to borrow information across years, we include a latent AR(1) shift 

in the intercept. In this case, p, f = cx p[ a '+Ct+ x i P t ] w here t, which indexes 

time, ranges from 2001 to 2004. For the temporal effect, (t, we place an AR(1) 
prior. Again, despite this added complexity, because of the conditional normality 
of the components in this model, both on and Q can easily be updated from known 
full conditional distributions (multivariate normal). For details, see Prado & West 
(2010). Notice that the region-specific model and parameters remain constant for 
all years within a region. Using our modified model, we obtain estimates of the 
number of unreported murders from 2000-2004 for each individual municipality and 
year. 

We include data from five administrative lists and include up to four-way inter- 
action terms as potential covariates, resulting in 31 potential covariates for each 
region. We enforce the restriction that main effects and an intercept must be 
present, leaving 20 covariates for the algorithm to include or exclude appropriately. 
The results of the SSIP are shown in Table |3] 

We find that from 2001 to 2004, there tended to be an increasing pattern of 
violence, reaching its peak in this time period in 2004. We also are able to see that 
the municipalities that experienced the greatest extent of undocumented killings 
were Aguazul and Yopal, neighboring regions in the middle of the state. These 
results are shown graphically in Fi gure |6) Table [4] provides a comparison of our 



results to those of Guberek et al. (20101, labeled "TCU" due to the title of the 



article. In this table, we have aggregated our results to match the regional aggre- 
gation of Guberek et al. (2010). The main points of interest in this case is that in 
almost all region/year strata, the credible intervals obtained from the SSIP contain 
the estimates from the previous analysis and vice-versa. So, despite permitting 
a much more discretized, detailed view of the pattern of undocumented killings, 
this analysis largely confirms the previous results obtained by large-scale spatial 
aggregation. 



5. Discussion and Future Directions 

We have introduced a variable selection prior for spatially varying regression 
that allows for regionally different models, while combining model information via 
spatially smoothed inclusion probabilities. In any situation in which the liklihood 
admits a conditionally normal representation, this prior results in a fully conjugate 
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FIGURE 6. Posterior mean estimate of the number of unreported 
killings by region (left) and the mean estimates as a percent of the 
total population (right). 

Table 3. Posterior mean and posterior credible intervals of the 
number of undocumented killings in Casanare, CO by municipality 
and year. 



2001 2002 2003 2004 



aguazul 


31 


[9 


, 77] 


39 


[8 


, 107] 


78 


[2 


, 29] 


180 [ , 6 ] 


chameza 


1 [ 


12 


,96] 


2 [ 


1 


16 ] 


3[ 


4 , 


66 ] 


7[ 


o,n] 


hato corozal 


1 [ 


24 


, 190] 


1 [ 


1 


20 ] 


2[ 


, 


5] 


5[ 


1 , 26] 


la salina 


1 [ 


57 


,437] 


1 [ 


3 


40 ] 


2[ 


, 


6] 


4[ 


1 , 19] 


mam 


6 [ 


, 


5] 


8 [ 


9 


91 ] 


16 


[0 


,12] 


37 


[ 1 , 23 ] 


monterrey 


6 [ 


, 


6] 


7 [ 





7] 


15 


[0 


, 26] 


34 


[ 3 , 46 ] 


nunchia 


2 [ 


o , 


12] 


2 [ 





8] 


4[ 


o , 


5] 


10 


[ 8 , 104 ] 


orocue 


1 [ 


1 , 


26 ] 


1 [ 





16] 


3[ 


o , 


6] 


7[ 


0,8] 


paz de ariporo 


5 [ 


o , 


2] 


6 [ 


1 


36 ] 


12 


[0 


,H] 


28 


[ , 10 ] 


pore 


3 [ 


o , 


3] 


4 [ 





4] 


8[ 


o , 


24 ] 


19 


[ 1 , 20 ] 


rccctor 


1 [ 


o , 


5] 


2 [ 





5] 


3[ 


o , 


3] 


7[ 


2 ,45] 


sabanalarga 


1 [ 


1 , 


10] 


2 [ 





10] 


3[ 


o , 


4] 


7[ 


3 ,31] 


sacama 


1 [ 


o , 


3] 


1 [ 


1 


23 ] 


2[ 


o , 


7] 


4[ 


3 , 38 ] 


san luis de palenque 


1 [ 


o , 


4] 


1 [ 


1 


14] 


2[ 


o , 


16 ] 


5[ 


7 , 76] 


tamara 


1 [ 


o , 


7] 


2 [ 


1 


17] 


3[ 


o , 


3] 


8[ 


18 , 172 ] 


tauramena 


6 [ 


o , 


16 ] 


8 [ 


2 


34 ] 


16 


[0 


,3] 


36 


[ 33 , 357 ] 


trinidad 


2 [ 


1 , 


19 ] 


3 [ 


6 


77 ] 


6[ 


o , 


6] 


14 


[ 42 , 448 ] 


villanueva 


9 


1 , 


24 ] 


11 


[0 


, 12 1 


23 


[1 


,14] 


52 


[ 85 , 899 ] 


yopal 


138 [ 


3 , 47 ] 


174 [ 


1 , 15] 


349 [ , 5 ] 


803 [ 200 , 2064 



model. We have demonstrated the ability of the SSIP prior improve model selec- 
tion relative to independent site-specific models in two simulation studies. With 
improved model selection, we have also shown the potential for improved parameter 
fitting and prediction in the simulation studies. We applied the SSIP prior to an 
election dataset and were better able to distinguish sensible regions in which each 
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Table 4. A comparison of the posterior estimates and credible in- 
tervals obtained by the Polya-Gamma regression model with SSIP 
versus a previous analysis (labeled TCU). 





2001 


2002 


2003 


2004 


D-SSIP 


174 [ 52 , 401 ] 


218 [ 66 , 503 ] 


437 


[ 132 , 1012 ] 


1007 [ 310 , 2317 ] 


E-SSIP 


7 [ 2 , 17 ] 


9 [3 , 21 ] 


18 [ 


6 , 41 ] 


42 [ 14 , 92 ] 


F -SSIP 


30 [ 11 , 66 ] 


38 [ 15 , 83 ] 


75 [ 


30 , 165 ] 


173 [ 71 , 375 ] 


G -SSIP 


14 [ 5 , 30 ] 


17 [ 6 , 37 ] 


35 [ 


14 , 73 ] 


79 [ 32 , 166 ] 


D-TCU 


399 [ 4 , 2198 ] 


365 [ 3 , 2105 ] 


73 [ 


, 205 ] 


334 [ 4 , 1871 ] 


E-TCU 


NA [ NA , NA ] 


4 [ , 16 ] 


107 


[ , 813 ] 


153 [ , 1268 ] 


F-TCU 


NA [ NA , NA ] 


NA [ NA , NA ] 


41 [ 


, 225 ] 


237 [ 2 , 1504 ] 


G-TCU 


21 [ , 96 ] 


20 [ , 82 ] 


131 


[ 1 , 892 ] 


108 [ 3 , 539 ] 



covariate is explanatory for the percent of the population that voted Democrat in 
the 2008 presidential election. We show how the SSIP prior can easily be used in a 
negative binomial regression for capture-recapture. We found that by allowing for 
regionally varying interaction terms to be included, we were able to reproduce pre- 
vious results that estimated aggregated regions separately, thus giving us a higher 
resolution (both spatially and temporally) understanding of the extent of violence 
in Casanare, Colombia during a period of human rights abuses from 2001-2004. 

In terms of future development of the SSIP, we believe it could be altered to ac- 
commodate the genetic applications such as those described in Stingo et al. (2011 ). 
The SSIP prior in its current form specifies a marginal prior probability of one-half 
for each covariate. Further work could be done to allow for varying marginal prior 
inclusion probabilities while maintaining spatial smoothing by placing a prior on 
the mean of a re-centered latent IAR process. The zero threshold would remain. 
For methodological development of the SSIP for CRC, many enhancements could 
be added to the model suggested above. For example, one could include an offset 
of the total population in each region to account for different-sized regions or al- 
low separate latent time-series by region. One might also investigate the extent to 
which spatio-temporal disaggregation is possible under this model. 
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